Fix 7180 autodetect #7691

bjonen · 2014-07-08T08:48:52Z

This PR closes #7180

In the terminal:
If maxcols == 0 then auto detect columns
if maxrows == 0 auto detect rows

bjonen · 2014-07-15T07:14:52Z

Are you ok with updating the docs as follows? @jorisvandenbossche @jreback

display.max_columns
If max_cols is exceeded, switch to truncate view. 
Depending on `large_repr` objects are either 
centrally truncated or printed as a summary view. 

In case python/IPython is running in a terminal and 
`large_repr` == truncate this can be set to 0 and pandas will 
correctly auto-detect the width of the terminal and print a 
truncated object which fits the screen width. 
The IPython notebook, IPython qtconsole, 
or IDLE do not run in a terminal and hence it is 
not possible to do correct auto-detection. 
'None' value means unlimited.

display.max_rows:
See display.max_columns

jreback · 2014-07-15T12:38:26Z

is 0 different than None?

bjonen · 2014-07-16T07:44:37Z

I think it used to be different. Actually I have to check whether the current implementation respects that. You think it's confusing to have 0 and None?

jreback · 2014-07-16T11:46:35Z

ok it appears that both max_rows/max_columns accept None. can you see:

have test for 0 and None yield the same (and if they don't, pls post). This AFAICR should not be the case, but needs checking

armaganthis3 · 2014-07-16T17:19:20Z

This was meant to fix a regression, where width auto-detection no longer works.
@jreback , you're asking #bjonen to change established behaviour (None means unlimited).
The only reason seems like your own aesthetic preference that 0 and None should be synonymous. they are not. Not in type and not in established meaning.

Why constantly introduce unnecessary changes to existing behavior? there's no reason.
You didn't even read the docstring @bjonen was asking about.

xref the discussion in #7750

cc @jorisvandenbossche

jreback · 2014-07-16T17:24:40Z

@armaganthis3 I guess you are not reading my comments.

I asked @bjonen to confirm what the functionaility is. I don't really recall what its supposed to be.

Of course 0 and None should, all other things being equal, have the same meaning. Unless their is REALLY REALLY good reason.

Why don't you do some research and post it here. That would actually be helpful.

armaganthis3 · 2014-07-19T13:37:40Z

have test for 0 and None yield the same (and if they don't, pls post). This AFAICR should not be the case, but needs checking

You asked him to add a test to ensure they behave the same, and you did not read the docstring
he posted above, which explicitly states they are different. an example of carelessness.

Of course 0 and None should, all other things being equal, have the same meaning.

Utter nonsense. You are making stuff up and then proclaiming them as rules. It's a minor point,
but it's another example of so many others in #7750 that it's become disturbing.

bjonen · 2014-07-20T16:59:43Z

Ok, both before and in the current PR, None is unlimited (no truncation, wrap around instead).

jorisvandenbossche · 2014-07-28T20:35:27Z

@bjonen Your updated description in the comment above looks OK. Can you update the PR for this?

A question: what means 'autodetect' in the case of rows?

bjonen · 2014-07-29T15:40:36Z

Sure I can do that.

Autodetect for rows:
We can get the terminal height which corresponds to the number of rows that can be displayed.

(w,h) = get_terminal_size()

There are two parameters:

row_fac = 0.85  # Fraction of row space to fill on row_num == 0
col_fac = 0.90  # Fraction of screen width to fill on col_num == 0

jorisvandenbossche · 2014-07-29T18:24:27Z

Thanks for the clarification!

bjonen · 2014-09-06T16:39:24Z

Added docs and rebased. Ready from my side.

jreback · 2014-09-08T13:51:02Z

looks fine, @jorisvandenbossche ?

jreback · 2014-09-08T13:52:03Z

@bjonen actually, can you add a release note for this in v0.15.0. Is a bigger notice needed? (its sort fo an edge case, thinking no)

jorisvandenbossche · 2014-09-08T13:55:49Z

pandas/core/config_init.py

-    for a dataframe prints out fully or just a summary repr.
-    'None' value means unlimited.
+    If max_rows is exceeded, switch to truncate view. View
+    display.max_columns for details.


I would leave 'real' explanation of what max_rows does also here, instead of referring to max_cols (I think it is annoying if you do pd.describe_option('max_rows) and you don't get the answer).

jorisvandenbossche · 2014-09-08T14:02:38Z

@bjonen I added still some comments on the docstring (sorry for not doing this before you updated and rebased)

I think this is a bug fix? (and so should go in that section?)

jorisvandenbossche · 2014-09-08T14:14:44Z

Something else, I tested it, and it doesn't seem to work correctly for me (tested it on Windows)

On the standard size (width of 80), it gives one column too much (and overflows). If I resize the terminal, then it works (eg with width of 75).
Also, if I use this option, there seems to be a lag of almost a second for displaying a DataFrame:

In [20]: df = pd.DataFrame(np.random.randn(100,1000))
In [21]: pd.options.display.max_rows = 10
In [22]: pd.options.display.max_columns = 10
In [23]: %timeit df.__repr__()
100 loops, best of 3: 12 ms per loop

In [24]: pd.options.display.max_rows = 0
In [25]: pd.options.display.max_columns = 0
In [26]: %timeit df.__repr__()
1 loops, best of 3: 3.15 s per loop

And get_terminal_size is not the culprit:

In [28]: %timeit get_terminal_size()
10000 loops, best of 3: 28.7 µs per loop

bjonen · 2014-09-08T21:53:34Z

@jorisvandenbossche Thanks for looking at this PR.

Overflow issue: I should be subtracting absolute values from the width/height instead of a percentage value. The row count at the bottom, for example, takes the same number of rows independent of the w/h. I fixed it for rows and it works when I resize the terminal on Ubuntu. For columns, there is indeed a problem. I have to investigate this further.
Speed issue: I tested this with (100,100) frames and the speed is ok. This indeed gets very slow for very large dfs. The reason is that I generate the string for the whole frame. Then I know the size of the elements and can cut as many columns as needed to fit to the desired width. I went this route because it was fast to implement.
Alternatively one could add columns left and right until the screen is filled. This way one only has to format columns that are ultimately printed.

bjonen · 2014-09-09T17:03:30Z

If you feel the speed issue is a deal breaker, I can attempt an alternative implementation. But this will take some time.

If you want to get this into the next release, I could take care of the overflow issue and then mention that this display gets slow for dfs greater than (100,100).

jreback · 2014-09-09T17:05:27Z

@bjonen you can check up front whether the size is an issue (e.g. > 100x100)?
if so, then just take some bigger size (say 200x200) and do your calcs (where you are reducing rows and such). then don't ever have to deal with a really big size. (slight downside is that in theory in may not fill up the space, but you can make a good guess I think)

bjonen · 2014-09-10T00:15:10Z

@jreback Following your suggestion the speed improves quite a bit. auto-detection is still factor 10 slower though:

Without:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(1000,1000))
In [4]: %timeit df.repr()
10 loops, best of 3: 40.3 ms per loop

And with auto detection:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(1000,1000))
In [4]: pd.options.display.max_columns = 0
In [5]: pd.options.display.max_rows = 0
In [6]: %timeit df.repr()
1 loops, best of 3: 343 ms per loop

I fixed the overflow issue on my machine. It would be great if someone could check it (especially on windows). Also updated the docs and rebased. Thanks for the help guys!

jreback · 2014-09-10T00:19:15Z

this is only for max_columns=o right? (eg auto detection is on)

bjonen · 2014-09-10T00:24:37Z

Yes, continuing the same session:

In [7]: pd.options.display.max_rows = 20
In [8]: pd.options.display.max_columns = 20
In [9]: %timeit df.repr()
10 loops, best of 3: 19.9 ms per loop

bjonen · 2014-09-10T00:32:22Z

However, just found these two cases are broken:

max_cols == 0 and max_rows != 0
max_cols != 0 and max_rows == 0

bjonen · 2014-09-11T19:58:14Z

Fixed remaining issues and added line in release note.

bjonen · 2014-09-11T20:36:09Z

Checked in the notebook and the truncation seems unaffected by the changes.

@jreback @jorisvandenbossche
When you have time, could you please test the auto-resizing on your machines? On windows you might have to set widht==0.

jorisvandenbossche · 2014-09-11T20:44:05Z

@bjonen what do you mean with "have to set width==0"?

bjonen · 2014-09-11T23:05:32Z

Sorry, that was not very clear. I meant pd.options.display.width. I set this to None in 0.14 to infer the size of the terminal instead of the default 80.

However, I checked briefly and It shouldn't be necessary to adjust this value to get the df displayed across the entire terminal when max_columns = 0.

jorisvandenbossche · 2014-09-17T18:44:23Z

@bjonen I can confirm that it now works as expected on Windows

It is also quicker, although you notice it is a bit slower than normal (it is not 'irritating' anymore, but if it would be easily possible to make it as fast as the normal display, that would be a plus I think, but maybe not for this PR)

jorisvandenbossche · 2014-09-17T19:17:26Z

But I noticed something else in the printing, not necessarily related to this (it occurs both in max_columns=0 case as default parameter).
So the columns names in the 'left' side from the truncation are misaligned by one space:

In [33]: df = pd.DataFrame(np.random.randn(10,30))

In [34]: df
Out[34]:
         0         1         2         3         4         5         6   \
0 -0.408156  0.416667  0.627905  0.393364 -1.662219 -0.601590 -0.553563
1 -0.447229 -2.427161 -0.663929  0.634625 -0.765336  0.193455  0.326905
2  1.412678 -0.209811 -0.692369 -0.593017 -1.090450  1.132212  0.665779
3  0.748226  0.565255  0.500697 -1.666073  0.378457  0.701584 -1.593952
4 -0.156511  1.810054 -0.566796  0.763574 -1.315514 -0.250945 -0.672395
5  0.857299 -0.153263 -1.718789  0.930296  0.096889  0.262306 -0.419779
6 -1.547626 -0.886810 -0.488989  1.136803 -0.988654 -0.069571 -0.010908
7 -0.396801 -1.282552  1.528441  1.119888  0.374016  0.072011 -0.496032
8  0.466879 -1.849309 -1.668141  0.260994  0.195670  0.983986 -0.659583
9  0.271477  1.455909 -0.949029  1.563097 -0.529267 -0.445536  0.465593

         7         8         9     ...           20        21        22  \
0 -0.027451 -1.492445  0.236370    ...    -0.207124  0.942174  0.993658
1  0.395174 -0.168197 -1.092535    ...     0.870743  0.466931 -0.650112
2  0.012762 -0.145582 -0.857474    ...     1.518490 -1.223810 -0.117409
3 -1.651538  0.546825  0.016696    ...     0.656450  1.289500  0.513666
4 -0.413658  0.923769  0.583012    ...     1.298192 -1.038987  0.738511
5 -0.784654  0.129776 -0.911785    ...    -1.032633  0.235149  0.374142
6  0.917978  1.120702  0.723245    ...    -0.915931 -1.003466  0.432385
7 -0.257428 -0.460610  2.318658    ...     0.430248 -2.039238  1.090794
8  0.438255  0.060507  0.290638    ...    -1.294980 -0.695728  0.078912
9  0.592629  0.159337  1.215215    ...     0.991044  0.330678  0.536602

         23        24        25        26        27        28        29
0  0.040387  0.478344  0.754135 -0.025597  1.516723  0.585786  1.367364
1  0.535671 -0.532744 -0.010694 -0.323556  0.424386 -0.916463  0.309135
2  0.032086  0.359642  0.150719  0.482848 -0.115011  1.714101 -0.604355
3  3.558434 -1.161022  0.973159  0.979117 -0.194935 -0.162460  0.199642
4 -0.193967  0.400109 -0.047031  0.181616  0.533648  0.040881  1.456876
5  1.765558  1.216659  0.506202  0.252224  0.907562 -0.645786 -0.097129
6  1.529280 -0.656682 -0.577339 -0.668127 -0.031473  1.649144 -0.466511
7 -0.137917 -0.228749  0.211181 -0.378891 -1.412424 -1.246940  1.351121
8  0.840317  0.993658  0.627771  1.125502  0.051515 -0.050630  0.474245
9  0.640976  0.954312  2.334820 -0.014047  0.472212 -0.684699 -0.994627

[10 rows x 30 columns]

Update: opened a seperate issue about this, also occurs on master.

bjonen · 2014-09-17T21:54:20Z

@jorisvandenbossche Thanks for testing the PR! I'm glad to hear it's working on Windows as well.

bjonen · 2014-09-18T07:25:03Z

Regarding the speed issue: Right now, I don't see a quick fix. So I would do this in a separate PR.

jreback · 2014-09-18T13:20:14Z

looks fine to me.just needs a rebase.

This is simply a bug fix, so looks ok, unless @jorisvandenbossche thinks should be more prominent in release notes.

jorisvandenbossche · 2014-09-18T13:23:08Z

no, looks ok

jreback · 2014-09-18T13:55:33Z

merged via ccb196f

thanks @bjonen

jreback added Bug labels Jul 8, 2014

jreback added this to the 0.15.0 milestone Jul 8, 2014

bjonen force-pushed the fix_7180_autodetect branch 2 times, most recently from 8003764 to 2537c4d Compare September 6, 2014 16:38

jorisvandenbossche reviewed Sep 8, 2014
View reviewed changes

bjonen force-pushed the fix_7180_autodetect branch from 3fec539 to 7d027e0 Compare September 10, 2014 00:09

BUG: Auto-detect terminal size on max_col/max_row==0

1f67279

bjonen force-pushed the fix_7180_autodetect branch from 7d027e0 to 1f67279 Compare September 11, 2014 19:56

bjonen mentioned this pull request Sep 17, 2014

Incorrect truncating of Series according to max_rows #7508

Closed

jreback closed this Sep 18, 2014

jreback mentioned this pull request Sep 18, 2014

max_columns == 0 incorrectly wraps around for wide dfs #7180

Closed

Fix 7180 autodetect #7691

Fix 7180 autodetect #7691

Conversation

bjonen commented Jul 8, 2014

bjonen commented Jul 15, 2014

jreback commented Jul 15, 2014

bjonen commented Jul 16, 2014

jreback commented Jul 16, 2014

armaganthis3 commented Jul 16, 2014

jreback commented Jul 16, 2014

armaganthis3 commented Jul 19, 2014

bjonen commented Jul 20, 2014

jorisvandenbossche commented Jul 28, 2014

bjonen commented Jul 29, 2014

jorisvandenbossche commented Jul 29, 2014

bjonen commented Sep 6, 2014

jreback commented Sep 8, 2014

jreback commented Sep 8, 2014

jorisvandenbossche Sep 8, 2014

Choose a reason for hiding this comment

jorisvandenbossche commented Sep 8, 2014

jorisvandenbossche commented Sep 8, 2014

bjonen commented Sep 8, 2014

bjonen commented Sep 9, 2014

jreback commented Sep 9, 2014

bjonen commented Sep 10, 2014

jreback commented Sep 10, 2014

bjonen commented Sep 10, 2014

bjonen commented Sep 10, 2014

bjonen commented Sep 11, 2014

bjonen commented Sep 11, 2014

jorisvandenbossche commented Sep 11, 2014

bjonen commented Sep 11, 2014

jorisvandenbossche commented Sep 17, 2014

jorisvandenbossche commented Sep 17, 2014

bjonen commented Sep 17, 2014

bjonen commented Sep 18, 2014

jreback commented Sep 18, 2014

jorisvandenbossche commented Sep 18, 2014

jreback commented Sep 18, 2014